Abstract: This paper presents a new feedback shift register-based
method for embedding deterministic test patterns on-chip, suitable
for complementing conventional BIST techniques for in-field
testing. Our experimental results on 8 real designs show that
the presented approach outperforms the bit-flipping approach by
24.7% on average. We also show that it is possible to exploit the
uneven distribution of don't care bits in test patterns in order to
reduce the area required for storing deterministic test patterns
by more than a factor of 3 with less than a 2% drop in fault coverage.

Index Terms: BIST, top-off test patterns, feedback shift register,
NLFSR, in-field testing.

I. Introduction 

Large test data volume is widely recognized as a major 
contributor to the testing cost of integrated circuits [1]. The
test data volume in 2017 is expected to be 10 times larger 
than the one in 2012 0. On the contrary, the size of the 
Automatic Test Equipment (ATE) memory is expected to only
double [3].

A number of efficient on-chip test compression techniques 
have been proposed as a solution for reducing ATE memory 
requirements, including 0, 0, 0, 0, 0. A test set for 
the circuit under test is compressed to a smaller set, which is 
stored in ATE memory. An on-chip decoder is used to generate 
the original test set from the compressed one during test 
application. Test compression has already established itself as 
a mainstream design-for-test methodology for manufacturing 
testing 0. However, it cannot be used for in-field testing 
where ATE is not available Q. 

For in-field testing, Built-in Self Test (BIST) including 
use of JTAG is applied, in which either pseudo-random test 
patterns are generated within the system or pre-computed 
deterministic test patterns are stored in system memory Q. In 
terms of test application time and fault coverage, deterministic 
test patterns are obviously more effective than pseudo-random 
ones. The fault coverage achieved with pseudo-random test 
patterns can be as low as 65% 0. Several methods for 
increasing BIST test coverage have been proposed, including 
modification of the circuit under test 0, insertion of control 
and observe points into the circuit 0, modification of the 
LFSR to generate a sequence with a different distribution of 
0s and 1s [10], embedding of deterministic test patterns into
LFSR's patterns by LFSR re-seeding [11] or bit-flipping [12],
or storing them in an on-chip memory [13]. The idea of
complementing pseudo-random patterns with deterministic 
patterns is particularly attractive because the deterministic
patterns can also solve the problem with transition or delay
faults which are not handled efficiently by the pseudo-random 
patterns. However, the area required to store deterministic 
test patterns within the system can be prohibitively high. For 
example, the memory required to store them may exceed 30% 
of the memory used in a conventional ATPG approach [14].

In this paper, we propose a new method for embedding
deterministic test patterns on-chip suitable for complementing
conventional techniques for in-field testing. We generate
deterministic test patterns using a structure known as a binary
machine. This name was introduced by S. Golomb in his
seminal book [15]. Binary machines can be considered as a
more general type of Non-Linear Feedback Shift Registers
(NLFSRs) [16] in which every stage is updated by its own
feedback function.

Binary machines are typically smaller and faster than NLFSRs
generating the same sequence. For example, consider the
4-stage NLFSR with the feedback function

$$f(x_0, x_1, x_2, x_3) = x_0 \oplus x_3 \oplus x_1 \cdot x_2 \oplus x_2 \cdot x_3,$$

where "$\oplus$" is the XOR (addition modulo 2), "$\cdot$" is the AND,
and $x_i$ is the variable representing the value of the stage
$i$, $i \in \{0,1,2,3\}$. If this NLFSR is initialized to the state
$(x_0 x_1 x_2 x_3) = (1000)$, it generates the output sequence

(1,0,0,0,1,1,0,1,0,1,1,1,1,0,0) (1) 

with the period 15. The same sequence can be generated by 
the 4-stage binary machine with the feedback functions 

$$f_3(x_0, x_3) = x_0 \oplus x_3$$

$$f_2(x_1, x_2, x_3) = x_3 \oplus x_1 \cdot x_2$$

$$f_1(x_2) = x_2$$

$$f_0(x_1) = x_1.$$

We can see that the binary machine uses 3 binary operations, 
while the NLFSR uses 5 binary operations. Furthermore, the 
depth of feedback functions of the binary machine is smaller 
than the depth of the feedback function of the NLFSR. Thus, 
the binary machine has a smaller propagation delay than the 
NLFSR. 
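The equivalence claimed in this example can be checked by simulation. The following is a minimal sketch, assuming that the output is taken from stage 0, that each stage $i < 3$ of the NLFSR takes the value of stage $i+1$, and that the binary machine starts from the same state (1000); the function names are ours.

```python
def nlfsr_step(state):
    """One clock of the 4-stage NLFSR: stage i takes the value of stage i+1,
    stage 3 takes the value of the feedback function f."""
    x0, x1, x2, x3 = state
    f = x0 ^ x3 ^ (x1 & x2) ^ (x2 & x3)
    return (x1, x2, x3, f)


def bm_step(state):
    """One clock of the 4-stage binary machine: every stage has its own
    feedback function (f0, f1, f2, f3)."""
    x0, x1, x2, x3 = state
    return (x1, x2, x3 ^ (x1 & x2), x0 ^ x3)


def output_sequence(step, state, length):
    """Collect the value of stage 0 over `length` clock cycles."""
    out = []
    for _ in range(length):
        out.append(state[0])
        state = step(state)
    return out


SEQ = [1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0]  # sequence (1)
nlfsr_out = output_sequence(nlfsr_step, (1, 0, 0, 0), 15)
bm_out = output_sequence(bm_step, (1, 0, 0, 0), 15)
print(nlfsr_out == SEQ, bm_out == SEQ)  # True True
```

The binary machine passes through different internal states than the NLFSR, but the values observed at stage 0 coincide.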

While binary machines can potentially be smaller and 
faster than NLFSRs, the search space for finding a best 
binary machine for a given sequence is much larger than the 
corresponding one for NLFSRs. Algorithms for constructing 
binary machines were presented in [17], [18]. Both algorithms
result in binary machines with the minimum number of stages
for a given binary sequence. However, they do not minimize
the circuit complexity of feedback functions. For Finite State
Machines (FSM), it is known that an FSM with a non-minimal
number of stages, e.g. encoded using one-hot encoding, often
has a smaller total size than an FSM with a minimal number
of stages [19].

Fig. 1. The general structure of an n-stage binary machine.

In this paper, we present an algorithm which constructs
binary machines with a non-minimal number of stages. Our
experimental results show that binary machines constructed
by the presented algorithm are 63.28% smaller on average
than those constructed by the algorithm of [18]. The
presented algorithm is particularly efficient for incompletely
specified sequences, which are important for testing.

The rest of the paper is organized as follows. Section II gives
an introduction to binary machines. Section III reviews related
work. Section IV describes the new algorithm for constructing
binary machines. Section V presents the experimental results.
Section VI concludes the paper and discusses open problems.



II. Binary Machines 

An $n$-stage binary machine consists of $n$ binary storage
elements, called stages [15]. Each stage $i \in \{0, 1, \ldots, n-1\}$
has an associated state variable $x_i \in \{0,1\}$ which represents
the current value of the stage $i$ and a feedback function
$f_i : \{0,1\}^n \rightarrow \{0,1\}$ which determines how the value of $x_i$
is updated (see Figure 1).

A state of a binary machine is a vector of values of its 
state variables. At every clock cycle, the next state of a binary 
machine is determined from its current state by updating 
the values of all stages simultaneously to the values of the 
corresponding feedback functions. An $n$-stage binary machine
has $2^n$ states corresponding to the set $\{0,1\}^n$ of all possible
binary $n$-tuples.

The degree of parallelization of an $n$-stage binary machine,
$k$, is the number of output bits generated at each clock cycle,
$1 \le k \le n$.

The dependence set of a Boolean function $f : \{0,1\}^n \rightarrow \{0,1\}$
is defined by

$$dep(f) = \{ j \mid f(X)|_{x_j=0} \neq f(X)|_{x_j=1} \},$$

where $f(X)|_{x_j=k} = f(x_0, \ldots, x_{j-1}, k, x_{j+1}, \ldots, x_{n-1})$ for
$k \in \{0,1\}$.
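As an illustration of this definition, the dependence set of a small function can be computed by exhaustive enumeration. The sketch below is only practical for a few variables; the helper name `dep` and the representation of a Boolean function as a Python callable are our assumptions.

```python
from itertools import product


def dep(f, n):
    """Return the set of indices j for which f(X)|x_j=0 differs from
    f(X)|x_j=1 for at least one assignment of the other variables."""
    result = set()
    for j in range(n):
        for x in product((0, 1), repeat=n):
            x_low = x[:j] + (0,) + x[j + 1:]
            x_high = x[:j] + (1,) + x[j + 1:]
            if f(*x_low) != f(*x_high):
                result.add(j)
                break
    return result


# f_2 of the 4-stage binary machine from the Introduction,
# viewed as a function of all four state variables
def f2(x0, x1, x2, x3):
    return x3 ^ (x1 & x2)


print(sorted(dep(f2, 4)))  # [1, 2, 3]
```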

The Algebraic Normal Form (ANF) [20] of a Boolean function
$f : \{0,1\}^n \rightarrow \{0,1\}$ (also called Reed-Muller canonical
form [21]) is an expression in the Galois Field of order 2,
GF(2), of type

$$f(x_0, \ldots, x_{n-1}) = \sum_{i=0}^{2^n-1} c_i \cdot x_0^{i_0} \cdot x_1^{i_1} \cdots x_{n-1}^{i_{n-1}},$$

where $c_i \in \{0, 1\}$ are constants and $(i_0 i_1 \ldots i_{n-1})$ is the binary
expansion of $i$.
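The ANF coefficients of a small function can be obtained from its truth table with the binary Moebius transform. The following sketch assumes a function given as a Python callable and reports each monomial as a tuple of variable indices, which sidesteps the question of bit ordering in the expansion of $i$; the name `anf_monomials` is ours.

```python
from itertools import product


def anf_monomials(f, n):
    """Return the monomials with coefficient 1 in the ANF of f.
    Each monomial is a tuple of variable indices; () is the constant 1."""
    coeff = {x: f(*x) for x in product((0, 1), repeat=n)}
    # Binary Moebius transform: one pass per variable, XOR-ing the entry
    # with x_j = 0 into the corresponding entry with x_j = 1.
    for j in range(n):
        for x in product((0, 1), repeat=n):
            if x[j] == 1:
                lower = x[:j] + (0,) + x[j + 1:]
                coeff[x] ^= coeff[lower]
    return sorted(
        tuple(j for j in range(n) if x[j]) for x, c in coeff.items() if c
    )


# f_2 = x_3 XOR x_1 * x_2 from the example in the Introduction
def f2(x0, x1, x2, x3):
    return x3 ^ (x1 & x2)


print(anf_monomials(f2, 4))  # [(1, 2), (3,)]
```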

III. Related Work 



The first algorithm for constructing a binary machine with
the minimum number of stages for a given binary sequence
was presented in [17]. This algorithm exploits the unique
property of binary machines that any binary $n$-tuple can be
the next state of a given current state. The algorithm assigns
every 0 of a sequence a unique even integer and every 1
of a sequence a unique odd integer. Integers are assigned
in increasing order starting from 0. For example, if the
8-bit sequence 00101101 is given, the sequence of integers
0,2,1,4,3,5,6,7 can be used. This sequence of integers is
interpreted as a sequence of states of a binary machine. The
largest integer in the sequence of states determines the number
of stages. In the example above, $\lceil \log_2 7 \rceil = 3$, thus the resulting
binary machine has 3 stages. The feedback functions $f_0, f_1, f_2$
implementing the resulting current-to-next state mapping are
derived using the traditional logic synthesis techniques [22].
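The even/odd assignment described above is simple enough to sketch directly. This is an illustration of the assignment step only, not of the logic synthesis that follows it; the function name is ours, and the number of stages is computed as the number of bits needed to represent the largest state.

```python
def assign_even_odd(bits):
    """Assign increasing even integers to the 0s and increasing odd
    integers to the 1s of a binary sequence."""
    next_even, next_odd = 0, 1
    states = []
    for b in bits:
        if b == 0:
            states.append(next_even)
            next_even += 2
        else:
            states.append(next_odd)
            next_odd += 2
    return states


bits = [int(c) for c in "00101101"]
states = assign_even_odd(bits)
print(states)  # [0, 2, 1, 4, 3, 5, 6, 7]

# The output bit is the least significant bit of each state.
print([s % 2 for s in states] == bits)  # True

# Number of stages needed to hold the largest state.
stages = max(states).bit_length()
print(stages)  # 3

# Current-to-next state mapping handed over to logic synthesis.
next_state = {states[i]: states[i + 1] for i in range(len(states) - 1)}
```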

Note that, in general, any permutation of integers can be 
used as a sequence of binary machine's states, as long as the 
selected integer modulo 2 is equal to the corresponding bit 
of the output sequence. Different state assignments result in 
different feedback functions. The size of these functions may 
vary substantially. 

In [18], the algorithm of [17] was extended to binary machines
generating $k$ bits of the output sequence per clock cycle.
The main idea is to encode a binary sequence into an $m$-ary
sequence which can be generated in a simpler way. As an
example, suppose that we use the 4-ary encoding (00) = 0, (01)
= 1, (10) = 2, (11) = 3 to encode the binary sequence 00101101
from the example above into the quaternary sequence 0231.
Then, we can construct a parallel binary machine generating
00101101 2 bits per clock cycle with a sequence of states
0, 2, 3, 1. Note that $\lceil \log_2 3 \rceil = 2$, so the resulting parallel
binary machine has one stage less than the binary machine
constructed above. This is surprising taking into account that
all existing techniques for the parallelization of LFSRs [23],
[24] and NLFSRs [25], [26] have an area penalty. It was shown
in [18] that, for random sequences, parallel binary machines
can be an order of magnitude smaller than parallel LFSRs or
NLFSRs generating the same sequence.



IV. Synthesis of binary machines 

The problem of finding a best binary machine for a given 
sequence can be divided into three sub-problems: 

1) Selecting an optimal degree of parallelization for a given 
binary sequence. 

2) Choosing an optimal state assignment for a given degree 
of parallelization. 

3) Finding a best circuit for feedback functions for a given 
state assignment. 

A. Optimal degree of parallelization 

The degree of parallelization determines how many output 
bits are generated per clock cycle. The size of binary machines 
may differ substantially for different parallelization degrees. 
The degree of parallelization is optimal if it minimizes the 
size of the resulting binary machine. 

In order to construct a binary machine with the degree of
parallelization $p$, we map a binary sequence into a $2^p$-ary
sequence by partitioning the binary sequence into vectors of
length $p$. The resulting vectors are treated as binary expansions
of elements of a $2^p$-ary sequence. The same approach was
used in [18].

Let us denote by $N_i$ the number of occurrences of a digit $i$
in the $2^p$-ary sequence, $0 \le i < 2^p$. Let $N_{max}$ be the largest of
$N_i$. In [18], it was shown that the minimum number of stages
in a binary machine generating a given binary sequence with
the degree of parallelization $p$ is equal to

$$k = \lceil \log_2 N_{max} \rceil + p. \quad (2)$$

From (2) we can see that if $N_{max} = 1$, then $k = p$. Such
a case is called full parallelization. Based on our
experimental results, we hypothesise that the optimal degree
of parallelization belongs to the interval

$$1 \le p_{opt} \le \lceil \log_2 n \rceil \quad (3)$$

where $n$ is the sequence length.
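The bound (2) is easy to evaluate for a concrete sequence. The sketch below assumes that the sequence length is a multiple of $p$; the function name is ours. For the sequence 00101101 used in Section III it gives 3 stages for $p = 1$ and 2 stages for $p = 2$, in agreement with the examples there.

```python
from collections import Counter


def min_stages(bits, p):
    """Minimum number of stages according to (2): ceil(log2(N_max)) + p.
    Assumes len(bits) is a multiple of p."""
    digits = [tuple(bits[i:i + p]) for i in range(0, len(bits), p)]
    n_max = max(Counter(digits).values())
    # (n_max - 1).bit_length() equals ceil(log2(n_max)) for n_max >= 1
    return (n_max - 1).bit_length() + p


bits = [int(c) for c in "00101101"]
for p in (1, 2, 4, 8):
    print(p, min_stages(bits, p))
# 1 3
# 2 2
# 4 4
# 8 8
```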

Note that for some applications, including testing, the 
degree of parallelization is specified by the user. For example, 
for testing it is equal to the number of scan chains. 

B. Optimal state assignment 

A state assignment determines a sequence of states which
a binary machine follows. Different sequences of states give
rise to different current-to-next state mappings and, thus, to
different updating functions. The state assignment is optimal
if it minimizes the size of the resulting binary machine.

Since a binary machine is a deterministic finite state automaton,
any current state has a unique next state. For a given
$2^p$-ary encoding, the minimal number of bits which has to be
added to $p$-tuples to make the current-to-next state mapping
unique is $\lceil \log_2 N_{max} \rceil$. The minimal number of stages in the
resulting binary machine is given by (2).

The strategy for state assignment presented in this paper has
two major differences from the one in [18]. First, we use a
non-minimal number of stages, namely

$$k \ge \lceil \log_2 \frac{n}{p} \rceil + p. \quad (4)$$



Second, we assign states so that the feedback functions
implementing the current-to-next state mapping depend on the
minimum number of state variables. It is known that a Boolean
function of $k$ variables needs $O(2^k/k)$ gates to be implemented
(Shannon-Lupanov bound) [27]. Feedback functions of binary
machines are random functions. For random functions, their
actual size is very close to the upper bound. So, each extra
variable nearly doubles the size of the function.

In our method, the feedback functions of an $(m+p)$-stage
binary machine depend on $m = \lceil \log_2 \frac{n}{p} \rceil$ variables only. In [18],
the feedback functions can potentially depend on all state
variables.

The pseudocode of the presented state assignment algorithm
is shown as Algorithm 1. The input of the algorithm is a binary
sequence $A = (a_0, a_1, \ldots, a_{n-1})$ and the desired degree of
parallelization $p$. The output is a sequence $S = (s_0, s_1, \ldots, s_{r-1})$ of
binary vectors $s_i = (s_{i,0}, s_{i,1}, \ldots, s_{i,p+m-1}) \in \{0,1\}^{p+m}$, where
$r = \lceil n/p \rceil$ and $m = \lceil \log_2 r \rceil$, corresponding to the states of a
$(p+m)$-stage binary machine generating $A$ with the degree of
parallelization $p$.

The algorithm partitions $A$ into $p$-tuples and prepends $m$
extra bits to the $i$th $p$-tuple. These extra bits
correspond to the binary expansion of the $i$th element of the
permutation vector $\Pi$.

Next, we define a mapping $s_i \mapsto s_{i+1}$ for all $i \in \{0, 1, \ldots, r-2\}$.
Since $\Pi$ is a permutation, each state in the resulting
sequence of states has a unique next state, so the mapping
is well-defined. The last state $s_{r-1}$ and each of the $2^{p+m} - r$
remaining states of the resulting $(p+m)$-stage binary machine
are mapped to don't care values. This gives us the possibility
to specify the functions $f_0, f_1, \ldots, f_{p+m-1}$ implementing the
current-to-next state mapping in a way which minimizes their
size. Since $r \le 2^m$, we can treat them as functions depending
only on the $m$ variables which hold the bits of $\Pi$. This is very
important, because, as we mentioned above, for random functions,
the size nearly doubles with each extra variable.

Since, by construction, the first $p$ bits of each state $s_i$
in $S = (s_0, s_1, \ldots, s_{r-1})$ correspond to the $i$th $p$-tuple of $A$,
the resulting binary machine generates $A$ with the degree of
parallelization $p$.

As an example, let us construct a binary machine which 
generates the following 20-bit binary sequence with the degree 
of parallelization 2: 

A = (0,0,1,1,0,1,1,1,0,0,1,0,1,1,1,0,1,1,0,0).

Since $n = 20$ and $p = 2$, we get $r = 10$ and $m = 4$. Suppose
we use the following permutation of (0, 1, ..., 15):

$\Pi$ = (1,8,4,2,9,12,6,11,5,10,13,14,15,7,3,0)

Then, we get the following sequence of states: 

S = (000100, 100011, 010001, 001011, 100100, 110010,
011011, 101110, 010111, 101000)
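The sequence of states of this example can be reproduced with a few lines of code. The following is a minimal sketch of the state assignment step, assuming that $n$ is a multiple of $p$ and that each state is written as a string with the $m$ permutation bits on the left and the $p$-tuple of $A$ on the right, as in the listing of $S$; the function name is ours.

```python
def assign_states(bits, p, perm):
    """Build the state sequence: the i-th state is the m-bit binary
    expansion of perm[i] followed by the i-th p-tuple of the sequence.
    Assumes len(bits) is a multiple of p."""
    r = -(-len(bits) // p)       # ceil(n / p)
    m = (r - 1).bit_length()     # ceil(log2(r))
    if sorted(perm) != list(range(2 ** m)):
        raise ValueError("perm must be a permutation of 0 .. 2^m - 1")
    states = []
    for i in range(r):
        perm_bits = format(perm[i], f"0{m}b")
        tuple_bits = "".join(str(b) for b in bits[i * p:(i + 1) * p])
        states.append(perm_bits + tuple_bits)
    return states


A = [0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0]
PI = [1, 8, 4, 2, 9, 12, 6, 11, 5, 10, 13, 14, 15, 7, 3, 0]
S = assign_states(A, 2, PI)
print(S)
# ['000100', '100011', '010001', '001011', '100100',
#  '110010', '011011', '101110', '010111', '101000']
```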

Algorithm 1 Assign states to a binary machine which generates
a binary sequence $A = (a_0, a_1, \ldots, a_{n-1})$ with the degree of
parallelization $p$.

$r := \lceil n/p \rceil$
$m := \lceil \log_2 r \rceil$
$\Pi := (\pi_0, \pi_1, \ldots, \pi_{2^m-1})$ is a permutation of $(0, 1, \ldots, 2^m - 1)$
Let $\pi_{i,j}$ be the $j$th element of the binary expansion of $\pi_i$, $j \in \{0, \ldots, m-1\}$
for every $i$ from 0 to $r - 1$ do
    for every $j$ from 0 to $p - 1$ do
        $s_{i,j} := a_{i*p+j}$
    end for
    for every $k$ from 0 to $m - 1$ do
        $s_{i,p+k} := \pi_{i,k}$
    end for
    $s_i := (s_{i,0}, s_{i,1}, \ldots, s_{i,p+m-1})$
end for
Return $S = (s_0, s_1, \ldots, s_{r-1})$

The functions implementing the resulting current-to-next state
mapping have the following defining table:
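x5 x4 x3 x2 | f5 f4 f3 f2 f1 f0
 0  0  0  1 |  1  0  0  0  1  1
 1  0  0  0 |  0  1  0  0  0  1
 0  1  0  0 |  0  0  1  0  1  1
 0  0  1  0 |  1  0  0  1  0  0
 1  0  0  1 |  1  1  0  0  1  0
 1  1  0  0 |  0  1  1  0  1  1
 0  1  1  0 |  1  0  1  1  1  0
 1  0  1  1 |  0  1  0  1  1  1
 0  1  0  1 |  1  0  1  0  0  0
 1  0  1  0 |  -  -  -  -  -  -

where "-" stands for a don't care value. Recall that the
functions depend on the four variables $x_5, x_4, x_3, x_2$ only. The
remaining 6 input assignments are mapped to don't cares. We
can implement the above functions as: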

$$f_5 = x_2 \oplus x_3$$

$$f_4 = x_5$$

$$f_3 = x_4$$

$$f_2 = x_3$$

$$f_1 = x_2 \oplus x_4$$

$$f_0 = (x_2 \oplus x_3)' \oplus x_3' x_4' x_5'$$

where "'" stands for a complement. 
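The feedback functions of this example can be checked against the sequence of states $S$. In the sketch below, each state is read as the string $(x_5 x_4 x_3 x_2 x_1 x_0)$; the assignments $f_3 = x_4$ and $f_2 = x_3$ and the product term $x_3' x_4' x_5'$ in $f_0$ are our reading of the example, chosen because they reproduce the state sequence, and should be treated as an assumption.

```python
S = ["000100", "100011", "010001", "001011", "100100",
     "110010", "011011", "101110", "010111", "101000"]


def next_state(s):
    """Apply the six feedback functions to a state written as x5 x4 x3 x2 x1 x0.
    All functions depend on x5, x4, x3, x2 only."""
    x5, x4, x3, x2, _x1, _x0 = (int(c) for c in s)
    f5 = x2 ^ x3
    f4 = x5
    f3 = x4
    f2 = x3
    f1 = x2 ^ x4
    f0 = (1 ^ x2 ^ x3) ^ ((1 ^ x3) & (1 ^ x4) & (1 ^ x5))
    return "".join(str(b) for b in (f5, f4, f3, f2, f1, f0))


# Every specified transition s_i -> s_{i+1} is reproduced.
ok = all(next_state(S[i]) == S[i + 1] for i in range(len(S) - 1))
print(ok)  # True

# Reading the two rightmost bits of each state yields the sequence A.
print("".join(s[4:] for s in S))  # 00110111001011101100
```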

It is important to use permutations $\Pi$ which have a low-cost
implementation. Examples of such permutations are sequences
of states generated by counters, LFSRs, or NLFSRs with
simple feedback functions [28]. In the example above, we
used the sequence of states of an LFSR with the generator
polynomial $1 + x + x^4$, with the all-0 state appended as the
last element.
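The permutation of the example can be generated by a 4-bit LFSR in a few lines. The shift direction and tap positions below are an assumption: they are one convention for the polynomial $1 + x + x^4$ that happens to reproduce the listed values, starting from the state 0001. The all-0 state, which an LFSR never reaches from a non-zero seed, is appended at the end to complete the permutation.

```python
def lfsr_states(seed=0b0001):
    """Enumerate the states of a 4-bit LFSR that shifts right and feeds
    (bit 0 XOR bit 1) into bit 3, until the seed state repeats."""
    states = []
    s = seed
    while True:
        states.append(s)
        feedback = (s ^ (s >> 1)) & 1
        s = (s >> 1) | (feedback << 3)
        if s == seed:
            break
    return states


perm = lfsr_states() + [0]
print(perm)
# [1, 8, 4, 2, 9, 12, 6, 11, 5, 10, 13, 14, 15, 7, 3, 0]
```

With this choice, the feedback functions of the permutation bits reduce to a single XOR and three wires, which is what makes the permutation cheap to implement.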

C. Best circuit for feedback functions 

The problem of finding a best circuit for a given Boolean
function is known to be notoriously hard. The exact solutions
are known only for functions of up to five variables [29]. However,
there are many powerful heuristic algorithms for multi-level
circuit optimization which are capable of finding good circuits
for larger functions [22].

TABLE I
Results for random sequences of length $2^{20}$.

Fig. 2. Results for random sequences of length $2^{20}$.

We optimize feedback functions using UC Berkeley's tool
ABC [30]. Our experimental results show that, even for
random functions, ABC is capable of reducing the size of
the original, non-optimized circuit by 30% on average.

V. Experimental Results 

A. Comparison to previous BM synthesis algorithms 

In the first experiment, we compared the presented algorithm
to the algorithm of [18]. Using both algorithms, we
constructed binary machines for random sequences of length
$2^{20}$ with a different number of don't care bits. The results
are summarized in Table I and Figure 2. As we can see,
the presented algorithm is significantly more efficient than the
algorithm of [18] for sequences with many don't cares. For the
case of 99% don't cares, it outperforms the algorithm of [18] by
93.4%.

B. Comparison to previous approaches for embedding deterministic
test patterns

In the second experiment, we compared the presented algorithm
to the bit-flipping approach for embedding deterministic
test patterns which, in our opinion, is one of the most efficient
ones [12]. The results presented in this section were obtained
using our implementation of the bit-flipping algorithm.

We applied both algorithms to 8 real designs with the
number of gates varying from 19K to 39K. The results are
summarized in Table II. We first applied 9000 pseudo-random
patterns to all designs. Then, we computed the top-off patterns
required to reach the maximum achievable stuck-at fault coverage
using a commercial ATPG tool. We used the bit-flipping and the
presented algorithms to represent these top-off patterns. As we
can see from Table II, the presented approach outperforms the
bit-flipping approach by 24.7% on average. The difference in
the number of gates required by the two approaches can be up to
51.5%. More importantly, the area overhead of the
presented approach goes down as the number of scan chains
grows, whereas the area overhead of the bit-flipping
approach goes up (see Figure 3).

TABLE II
Comparison to the bit-flipping approach for maximum achievable stuck-at fault coverage.

TABLE III
Area overhead of the presented approach for different stuck-at fault coverages.

However, in spite of the improvements, the percentage of the 
overall chip area required to store deterministic test patterns 
can be prohibitively high for some designs (see column 6 of 
Table [ffl) . It is known that the size of representation for a 
data set is related to the entropy of the data [31]. Entropy puts a
theoretical limit on the size of the minimal representation that 
can be achieved. 

If a lower fault coverage is acceptable, then the area
overhead can be reduced by exploiting the fact that don't care
bits are normally unevenly distributed among test patterns. As
an example, consider the diagram in Figure 4. Each point on
this diagram shows the number of don't care bits in a test
pattern of the dma benchmark (in total 411 patterns of length 1720
bits each). These test patterns were generated by a commercial
ATPG tool with dynamic compaction turned on and random
fill turned off. They cover 100% of detectable stuck-at faults.
The total percentage of specified bits is 6.45%. We can see that
only the first few test patterns are highly specified. If we remove
the first 5% of test patterns, the entropy of the remaining
patterns is halved. Therefore, they can be represented
with half the size required for
the whole set of test patterns. By using the last 95% of test
patterns, we can achieve 95.7% test coverage for stuck-at faults.

Fig. 3. Area overhead as a function of the number of chains.

In Table III, we show that, by using only a subset of the
top-off patterns, we can reduce the area required for
their representation by more than a factor of 3 in some cases,
while sacrificing less than 2% of the fault coverage.

VI. Conclusion 

We presented a new method for embedding deterministic 
test patterns on-chip based on binary machines. The presented 
algorithm for synthesis of binary machines is significantly 
more efficient than previous work, especially for test data with 
many don't cares. Our experimental results on 8 real designs
show that the proposed approach outperforms the bit-flipping
approach by 24.7% on average. We also show that it is possible
to exploit the uneven distribution of don't care bits in test patterns
to reduce the area required for generating top-off patterns by more
than a factor of 3 with less than a 2% decrease in fault coverage.

Fig. 4. Distribution of don't care bits in the test patterns of the dma benchmark.

We believe that the presented algorithm for synthesis of
binary machines is quite close to optimal. What can be
improved in the proposed method is the strategy for selecting 
a subset of top-off patterns which maximizes the fault coverage 
and minimizes the area overhead. At present, we use a simple 
greedy algorithm which selects top-off patterns based on the 
number of don't care bits and the number of covered faults. A 
more sophisticated approach is likely to bring better results. 

Binary machines can potentially be used for storing compressed
test patterns for on-chip test compression techniques.
This would eliminate the dependence of test compression on 
ATE memory. We are currently investigating the feasibility of 
such an approach on large industrial designs. 